Study of embedded font context and kernel space methods for improved videotext recognition
Abstract
Videotext refers to text superimposed on video frames. A videotext-based Multimedia Description Scheme has recently been adopted into the MPEG-7 standard. A study of published work on videotext extraction and recognition reveals that, despite recent interest, a reliable general-purpose video character recognition (VCR) system has yet to be developed. While researching and developing a character recognition algorithm designed specifically for the low-resolution output of automatic videotext extractors, we observed that the raw VCR accuracies obtained with various classifiers, including kernel space methods such as SVMs, are inadequate for accurate video annotation and browsing. Intelligent postprocessing mechanisms that draw on general data characteristics of the domain are therefore required to improve performance. We describe one such method, Font Context Analysis, which works independently of the raw character recognition technique. As a result, it can easily be combined with other VCR algorithms being developed elsewhere and offer the same performance gains. Experimental results on various video streams show notable improvements in recognition rates when our system combines an SVM-based character recognition mechanism with font context analysis.
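The abstract does not say how Font Context Analysis works internally, only that it postprocesses recognizer output independently of the classifier that produced it. The Python sketch below illustrates that decoupling under a hypothetical rule chosen for illustration, not the authors' method: it assumes the recognizer emits, for each character position, a list of candidates tagged with a font label and a score, and that characters in one videotext line tend to share a font. The line's majority font (voted from each position's top-1 candidate) is used to boost matching candidates by a hypothetical factor `boost`. The names `raw_top1`, `font_context_rerank`, and the candidate format are all assumptions.

```python
from collections import Counter
from typing import List, Tuple

# One recognition hypothesis: (character label, font label, classifier score).
# HYPOTHETICAL format: assumes the recognizer tags each candidate with a font.
Candidate = Tuple[str, str, float]


def raw_top1(line: List[List[Candidate]]) -> str:
    """Baseline decoding: take the highest-scoring candidate at each position."""
    return "".join(max(cands, key=lambda c: c[2])[0] for cands in line if cands)


def font_context_rerank(line: List[List[Candidate]], boost: float = 1.5) -> str:
    """Re-rank candidates using the line's dominant font (illustrative rule only).

    Assumption: characters in one videotext line usually share a font, so
    candidates in the majority font have their score multiplied by `boost`.
    """
    # Majority vote over the font of each position's top-1 candidate.
    votes = Counter(max(cands, key=lambda c: c[2])[1] for cands in line if cands)
    if not votes:
        return ""
    dominant_font = votes.most_common(1)[0][0]

    def adjusted(c: Candidate) -> float:
        # Boost candidates whose font agrees with the line-level context.
        return c[2] * (boost if c[1] == dominant_font else 1.0)

    return "".join(max(cands, key=adjusted)[0] for cands in line if cands)


if __name__ == "__main__":
    # Toy scores for a four-character line; not data from the paper.
    line = [
        [("G", "sans", 0.90)],
        [("O", "sans", 0.80)],
        [("0", "serif", 0.55), ("O", "sans", 0.45)],
        [("D", "sans", 0.70)],
    ]
    print(raw_top1(line))             # GO0D
    print(font_context_rerank(line))  # GOOD
```

With the toy input, raw top-1 decoding yields "GO0D", while the re-ranked output is "GOOD": three of four positions vote for the "sans" font, so the sans "O" (0.45 × 1.5 = 0.675) overtakes the serif "0" (0.55). Because the function only sees candidate lists, it could sit behind an SVM or any other classifier, which is the independence property the abstract emphasizes.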
Similar resources
Improving the Discriminant Model of Nonlinear Manifolds for Face Recognition with a Single Image per Person
Manifold learning is a dimension reduction method for extracting nonlinear structures from high-dimensional data. Many methods have been introduced for this purpose, and most of them extract a single global manifold for the data. However, in many real-world problems there is not just one global manifold; additional information about the objects is shared by a large number of manifolds...
A Bayesian Framework for Fusing Multiple Word Knowledge Models in Videotext Recognition
Videotext recognition is challenging due to low resolution, diverse fonts/styles, and cluttered background. Past methods enhanced recognition by using multiple frame averaging, image interpolation and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition by combining multipl...
Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs
In the field of word recognition, three approaches are used: word isolation, overall word shape, and a combination of the two. Most optical recognition methods recognize a word by breaking it into its letters and then recognizing them. This approach faces problems because of the difficulty of isolating letters and its limited recognition accuracy on texts with low image quality. Therefo...
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
A comparative study of two kernel eigenspace-based speaker adaptation methods on large vocabulary continuous speech recognition
Eigenvoice (EV) speaker adaptation has been shown effective for fast speaker adaptation when the amount of adaptation data is scarce. In the past two years, we have been investigating the application of kernel methods to improve EV speaker adaptation by exploiting possible nonlinearity in the speaker space, and two methods were proposed: embedded kernel eigenvoice (eKEV) and kernel eigenspace-b...